0. Import Libraries and Install Prerequisites

1. Dataset Overview

2. Fill NaNs

3. Data Normalization

To visualise the distribution of the variables, we plot a histogram and a Q-Q plot. If a variable is normally distributed, its ordered values should fall along a 45-degree line when plotted against the theoretical quantiles.
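The Q-Q check described above can be sketched with `scipy.stats.probplot`, which returns the (theoretical, ordered) quantile pairs together with the fitted line; the plotting itself is omitted here and the variable is synthetic for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_var = rng.normal(loc=50, scale=5, size=1000)  # illustrative variable

# probplot pairs the ordered sample values with theoretical normal quantiles
# and fits a line through them; r close to 1 means the points hug the line
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(
    normal_var, dist="norm"
)
print(round(r, 3))
```

In the notebook the same pairs would be passed to a scatter plot next to the histogram of the variable.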

3.1 Univariate Normal Distribution

We are going to use 2 common approaches to make data more Gaussian-like:

3.1.1 Plot Data Distribution: Yeo-Johnson

After transforming the data with several scaling and transformation techniques, we conclude that they are not appropriate for handling outliers; hence we apply different techniques as part of the multivariate normality analysis.
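The Yeo-Johnson step referenced above can be sketched with scikit-learn's `PowerTransformer`; the skewed variable here is synthetic, standing in for the notebook's right-skewed donation-style features:

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# heavily right-skewed variable, similar in spirit to donation amounts
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=(2000, 1))

# Yeo-Johnson estimates a power parameter by maximum likelihood and,
# unlike Box-Cox, also accepts zero and negative values
pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(skewed)

print(round(stats.skew(skewed.ravel()), 2),
      round(stats.skew(transformed.ravel()), 2))
```

The before/after skewness printed at the end is what the histogram and Q-Q plots show visually.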

Using the functions out_std and out_iqr, we obtain potential-outlier labels for each criterion. A value of 1 flags an observation as a potential outlier if at least one of its variables has an abnormal value.

We define abnormal values as those outside 2.5 × the interquartile range (out_iqr) and those more than 4 standard deviations away from the mean (out_std). We chose these less restrictive thresholds because the data naturally exhibits high variation.
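The exact implementations of out_std and out_iqr are not shown here; a minimal NumPy sketch consistent with the thresholds described (4 standard deviations, 2.5 × IQR) could look like this:

```python
import numpy as np

def out_std(X, n_std=4.0):
    """Flag rows where any column is more than n_std standard deviations from its mean."""
    X = np.asarray(X, dtype=float)
    z = np.abs(X - X.mean(axis=0)) / X.std(axis=0)
    return (z > n_std).any(axis=1).astype(int)

def out_iqr(X, k=2.5):
    """Flag rows where any column falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    X = np.asarray(X, dtype=float)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    mask = (X < q1 - k * iqr) | (X > q3 + k * iqr)
    return mask.any(axis=1).astype(int)

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))
data[0, 0] = 50.0  # inject an obvious outlier into the first row
print(out_std(data)[0], out_iqr(data)[0])
```

Both criteria flag the injected row; they differ on borderline observations, which is why both label sets are kept and compared.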

3.2 Multivariate Normal Distribution

3.2.1 Multivariate Normal Distribution Visualization Before Outlier Removal

3.2.2 Multivariate Normal Distribution Visualization After Outlier Removal

3.3 Categorical Outliers

3.4 UMAP for Initial Data Distribution

3.4.1 UMAP for Dataset Before Normalization

3.4.2 UMAP for Dataset After Normalization

4. Dimensionality Reduction

4.1 Metric features

4.1.1 PCA

4.1.2 List of Most Important Features

4.2 Non-Metric Features

4.2.1 MFA & FAMD

4.3 PhiK Correlation

Features Taken: RFA_2F, RFA_2A
Features Taken: IC3, HHD2, ETHC3
Features Taken: INCOME, N_ODATEDW, NUMCHLD, AGE
Features Taken: PETS
Features Taken: N_RDATE_7, RAMNT_7
Features Taken: FEDGOV, VIETVETS
Features Taken: N_ADATE_7
Features Taken: MBBOOKS, MBGARDEN
Features Taken: N_FISTDATE, MAXRAMNT, N_LASTDATE, N_MAXRDATE
Features Taken: CARDPROM

5. Clustering

5.1 K-Means

5.1.1 KElbow Visualizer

5.1.2 Silhouette Visualizer

5.1.3 KMeans

5.1.4 UMAP for K-Means

Another way to look at the K-Means clusters

From the graph above, we can conclude that the 8 clusters are clearly distinguishable when plotted with the "deep" color palette.

5.1.5 Intercluster Distance for K-Means

Intercluster distance maps display an embedding of the cluster centers in two dimensions, with the distances between centers preserved: the closer two centers are in the visualization, the closer they are in the original feature space. The clusters are sized according to a scoring metric, which gives a sense of their relative importance. Note, however, that two clusters overlapping in the 2D projection does not imply that they overlap in the original feature space.
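Yellowbrick's InterclusterDistance visualizer produces this map; the underlying idea can be sketched by embedding the fitted K-Means centers in 2D with multidimensional scaling (synthetic data here, standing in for the preprocessed features):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS

X, _ = make_blobs(n_samples=600, centers=8, n_features=10, random_state=7)
km = KMeans(n_clusters=8, n_init=10, random_state=7).fit(X)

# Embed the 10-dimensional centers in 2D while preserving pairwise distances;
# membership counts serve as one possible sizing metric for the bubbles
embedding = MDS(n_components=2, random_state=7).fit_transform(km.cluster_centers_)
sizes = np.bincount(km.labels_, minlength=8)
print(embedding.shape, sizes.sum())
```

Each center would then be drawn as a circle at its embedded position, scaled by its cluster size.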

5.2 K-Prototypes

5.2.1 Clustering

5.2.2 UMAP for K-Prototypes

5.2.3 Intercluster Distance for K-Prototypes

5.3 Evaluation by Classification using LGBM

We validate the quality of the clusters by treating them as labels and building a classification model on top. If the clusters are of high quality, the classifier will be able to predict them with high accuracy. At the same time, the model should rely on a variety of features, ensuring that the clusters are not too simplistic.

LightGBM will be used as the classifier, as it can handle categorical features natively.

5.3.1 K-Means

5.3.2 K-Prototypes

Classifiers for both clustering methods achieve an F1 score close to 1, which means that K-Means and K-Prototypes both produced easily distinguishable clusters. However, to classify the K-Prototypes clusters correctly, LightGBM uses more features, and some categorical features become important. This contrasts with K-Means, whose clusters could be classified almost perfectly using just 15-18 features. This suggests that the clusters produced by K-Prototypes are more informative.

5.4 K-Means & Hierarchical Clustering

5.4.1 Encoding Categorical Features

We have only 2 categorical features, each with a small number of distinct values (4 and 2), so we can encode them here and make the full list of features usable from the start.
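A minimal one-hot encoding sketch with scikit-learn's `OneHotEncoder`; the category values below are illustrative, not the notebook's actual levels, but match the described cardinalities (4 and 2):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Two illustrative categoricals with 4 and 2 levels respectively
cats = np.array([
    ["urban", "F"],
    ["rural", "M"],
    ["suburban", "F"],
    ["town", "M"],
])
enc = OneHotEncoder(handle_unknown="ignore")
onehot = enc.fit_transform(cats).toarray()  # 4 + 2 = 6 dummy columns
print(onehot.shape)
```

With cardinalities this low, one-hot encoding adds only six columns, so the encoded features can be fed directly into distance-based clustering alongside the metric ones.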

5.4.2 Clustering and Functions Definition

From the R² graph above for hierarchical clustering, we can conclude that the best linkage method is Ward; hence it will be used in the upcoming analysis.

5.5 SOM & Hierarchical Clustering

From the R² graph above for hierarchical clustering, we can conclude that the best linkage method is again Ward; hence it will be used in the upcoming analysis.

5.6 SOM & K-Means

5.7 Gaussian Mixture Model

5.8 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

5.9 Accuracy Metrics

6. Marketing Approach